The **`pgvector`** extension in PostgreSQL is used to efficiently store and query vector data. The **`pgvector`** extension provides
PostgreSQL with the ability to store and perform operations on vectors directly within the database.
Nile supports **`pgvector`** out of the box on the latest version - `0.8.0`.

Pgvector lets you store and query vectors directly within your usual Postgres database - with the rest of your data. This is both convenient and efficient. It supports:

- Exact and approximate nearest neighbor search (with optional HNSW and IVFFlat indexes)
- Single-precision, half-precision, binary, and sparse vectors
- L2 distance, inner product, cosine distance, L1 distance, Hamming distance, and Jaccard distance
- Any language with a Postgres client
  Plus ACID compliance, point-in-time recovery, JOINs, and all of the other great features of Postgres

## Create tenant table with vector type

Vector types work like any other standard types. You can make them the type of a column in a tenant table and Nile will take care of isolating the embeddings per tenant.

```sql
-- creating a table to store wiki documents for a Notion like
-- SaaS application with vector dimension of 3
CREATE TABLE wiki_documents(
    tenant_id uuid,
    id integer,
    embedding vector(3)
    );
```

## Store vectors per tenant

Once you have the table defined, you would want to populate the embeddings. Typically, this is done by querying a large language model (eg. OpenAI, HuggingFace), retrieving the embeddings and storing them in the vector store. Once stored, the embeddings follow the standard tenant rules. They can be isolated, sharded and placed based on the tenant they belong to.

```sql
INSERT INTO wiki_documents (tenant_id,id, embedding)
VALUES ('018ade1a-7843-7e60-9686-714bab650998',1, '[1,2,3]');
```

## Query vectors

Pgvector supports 6 types of vector similarity operators:

<table>
  <thead>
    <tr>
      <th>Operator</th>
      <th>Name</th>
      <th>Description</th>
      <th>Use Cases</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>&lt;-&gt;</td>
      <td>vector_l2_ops</td>
      <td>
        L2 distance. Measure of the straight-line distance between two points in
        a multi-dimensional space. It calculates the length of the shortest path
        between the points, which corresponds to the hypotenuse of a right
        triangle.
      </td>
      <td>
        Used in clustering, k-means clustering, and distance-based
        classification algorithms
      </td>
    </tr>
    <tr>
      <td>&lt;#&gt;</td>
      <td>vector_ip_ops</td>
      <td>
        Inner product. The inner product, also known as the dot product,
        measures the similarity or alignment between two vectors. It calculates
        the sum of the products of corresponding elements in the vectors.
      </td>
      <td>
        Used in similarity comparison or feature selection. Note that for
        normalized vectors, inner product will result in the same ranking as
        cosine distance, but is more efficient to calculate. So this is a good
        choice if you use an embedding algorith that produces normalized vectors
        (such as OpenAI's){' '}
      </td>
    </tr>
    <tr>
      <td>&lt;=&gt;</td>
      <td>vector_cosine_ops</td>
      <td>
        Cosine distance. Cosine distance, often used as cosine similarity when
        measuring similarity, quantifies the cosine of the angle between two
        vectors in a multi-dimensional space. It focuses on the direction rather
        than the magnitude of the vectors.
      </td>
      <td>
        Used in text similarity, recommendation systems, and any context where
        you want to compare the direction of vectors
      </td>
    </tr>
    <tr>
      <td>&lt;+&gt;</td>
      <td>vector_l1_ops</td>
      <td>
        L1 distance. The L1 distance, also known as the Manhattan distance,
        measures the distance between two points in a grid-like path (like a
        city block). It is the distance between two points measured along axes
        at right angles.
      </td>
      <td>
        Less sensitive to outliers than L2 distance and according to some
        research, better for high-dimensional data.
      </td>
    </tr>
    <tr>
      <td>&lt;~&gt;</td>
      <td>bit_hamming_ops</td>
      <td>
        Hamming distance. The Hamming distance measures the number of positions
        at which the corresponding symbols are different.
      </td>
      <td>
        Used with binary vectors. Mostly for discrete data like categories. Also
        used for error-correcting codes and data compression.
      </td>
    </tr>
    <tr>
      <td>&lt;%&gt;</td>
      <td>bit_jaccard_ops</td>
      <td>
        Jaccard distance. Measures similarity between sets by calculating the
        ratio of the intersection to the union of the two sets (how many
        positions are the same out of the total positions).
      </td>
      <td>
        Used with binary vectors. Useful for comparing customers purchase
        history, recommendation systems, similarites in terms used in different
        texts, etc.
      </td>
    </tr>
  </tbody>
</table>

You could use any one of them to find the distance between vectors. The choice of operator depends not only on the use-case, but also on
the model used to generate the embeddings. For example, OpenAI's models return normalized embeddings, so using inner product or cosine distance
will give the same results and inner product is more efficient. However, some models return non-normalized embeddings, so cosine distance should be used.

Real world vectors are quite large - 768 to 4096 dimensions are not uncommon. But for the sake of the example, we'll use small vectors:

To get the L2 distance

```sql
SELECT embedding <-> '[3,1,2]' AS distance FROM wiki_documents;
```

For inner product, multiply by -1 (since `<#>` returns the negative inner product)

```sql
SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM wiki_documents;
```

For cosine similarity, use 1 - cosine distance

```sql
SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM wiki_documents;
```

## Vector Indexes

`pgvector` supports two types of indexes:

- HNSW
- IVFFlat

Keep in mind that vector indexes are unlike other database indexes. They are used to perform efficient **approximate** nearest neighbor searches.

Without vector indexes (also called **flat indexes** by other vector stores), queries sequentially scan through all vectors for the given query,
and compute the distance to the query vector. This is computationally expensive, but guarantees to find the nearest neighbors
(also called **exact nearest neighbors** or **KNN**).

With vector indexes, the query will search a subset of the vectors that is expected to contain the nearest neighbors (but may not contain all of them).
This is computationally efficient, but the results are not guaranteed to be the nearest neighbors.

When using vector indexes, you can control the trade-off between speed and recall by specifying the index type and parameters.

### HNSW

An HNSW index creates a multilayer graph. It has slower build times and uses more memory than IVFFlat, but has better query performance
(in terms of speed-recall tradeoff). There’s no training step like IVFFlat, so the index can be created without any data in the table.

When creating HNSW, you can specify the maximum number of connections in a layer (`m`) and the number of candidate vectors considered
when building the graph (`ef_construction`). More connections and more candidate vectors will improve recall but will increase build time and memory.
If you don't specify the parameters, the default values are `m = 16` and `ef_construction = 64`.

Add an index for each distance function you want to use.

```sql
CREATE INDEX ON wiki_documents USING hnsw (embedding vector_l2_ops);
```

Inner product

```sql
CREATE INDEX ON wiki_documents USING hnsw (embedding vector_ip_ops);
```

Cosine distance

```sql
CREATE INDEX ON wiki_documents USING hnsw (embedding vector_cosine_ops);
```

Specifying the parameters

```sql
CREATE INDEX ON wiki_documents USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 100);
```

While querying, you can specify the size of the candidate list that will be searched (`hnsw_ef`):

```sql
SET hnsw_ef = 100;
SELECT * FROM wiki_documents ORDER BY embedding <=> '[3,1,2]' LIMIT 10;
```

Vectors with up to 2,000 dimensions can be indexed.

### IVFLAT

An IVFFlat index divides vectors into lists, and then searches a subset of those lists that are closest to the query vector.
It has faster build times and uses less memory than HNSW, but has lower query performance (in terms of speed-recall tradeoff).

Three keys to achieving good recall are:

1. Create the index **after** the table has some data
2. Choose an appropriate number of lists - a good place to start is `rows / 1000` for up to 1M rows and `sqrt(rows)` for over 1M rows.
3. When querying, specify an appropriate number of probes (higher is better for recall, lower is better for speed) - a good place to start is `sqrt(lists)`

Add an index for each distance function you want to use.

L2 distance

```sql
CREATE INDEX ON wiki_documents USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
```

Inner product

```sql
CREATE INDEX ON wiki_documents USING ivfflat (embedding vector_ip_ops) WITH (lists = 100);
```

Cosine distance

```sql
CREATE INDEX ON wiki_documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
```

When querying, you can specify the number of probes (i.e. how many lists to search):

```sql
SET ivfflat_probes = 100;
SELECT * FROM wiki_documents ORDER BY embedding <=> '[3,1,2]' LIMIT 10;
```

More probes will improve recall but will slow down the query.

Vectors with up to 2,000 dimensions can be indexed.

## Filtering

Typically, vector search is used to find the nearest neighbors, which means that you would limit the number after ordering by distance:

```sql
SELECT * FROM wiki_documents ORDER BY embedding <=> '[3,1,2]' LIMIT 10;
```

It is a good idea to also filter by distance, so you won't include vectors that are too far away, even if they are the nearest neighbors.
This will prevent you from getting nonsensical results in cases where there isn't much data in the vector store.

```sql
SELECT * FROM wiki_documents
WHERE embedding <=> '[3,1,2]' < 0.9
ORDER BY embedding <=> '[3,1,2]'
LIMIT 10;
```

Nile will automatically limit the results to only the vectors that belong to the current tenant:

```sql
SET nile.tenant_id = '018ade1a-7843-7e60-9686-714bab650998';

SELECT * FROM wiki_documents
WHERE embedding <=> '[3,1,2]' < 0.9
ORDER BY embedding <=> '[3,1,2]'
LIMIT 10;
```

And you can also filter results by additional criteria (some vector stores call this "metadata filtering" or "post-filtering"):

```sql
SELECT * FROM wiki_documents
WHERE embedding <=> '[3,1,2]' < 0.9
AND category = 'product'
ORDER BY embedding <=> '[3,1,2]'
LIMIT 10;
```

It is recommended to add indexes on the columns used for filtering, so the query can be optimized.
Especially when the use of the filter can lead to small enough result sets that a sequential scan is faster than use of the vector index.
This will optimize not just performance but also recall.

With approximate indexes, filtering is applied after the index is scanned. If a condition matches 10% of rows,
with HNSW and the default hnsw.ef_search of 40, only 4 rows will match on average.

Starting in version `0.8.0`, you can enable **iterative index scans**, which will automatically scan more of the index when needed.

### Iterative Index scans

Using iterative index scans, Postgres will scan the approximate index for nearest neighbors, apply additional filters and, if the number of
neighbors after filtering is insufficient, it will continue scanning until sufficient results are found.

Each index has its own configuration (GUC) for iterative scans: `hnsw.iterative_scan` and `ivfflat.iterative_scan`.
By default both configurations are set to `off`.

HNSW indexes support both relaxed and strict ordering for the iterative scans. Strict order guarantees that the returned results are ordered by exact distance.
Relaxed order allows results that are slightly out of order, but provides better recall (i.e. fewer missed results due to the approximate
nature of the index).

```sql
SET hnsw.iterative_scan = strict_order;
-- or
SET hnsw.iterative_scan = relaxed_order;
```

IVFFlat indexes only allow relaxed ordering:

```sql
SET ivfflat.iterative_scan = relaxed_order;
```

Once you set these configs, you don't need to change your existing queries. You should immediately see the same queries return the correct number of results.

However, if you use relaxed ordering, you can re-order the result using materialized CTE:

```sql
WITH relaxed_results AS MATERIALIZED (
    SELECT id, embedding <-> '[1,2,3]' AS distance FROM items WHERE category_id = 123 ORDER BY distance LIMIT 5
) SELECT * FROM relaxed_results ORDER BY distance;
```

If you filter by distance (recommended, to avoid nonsense results in case there aren't many similar vectors), it is recommended to use
materialized CTE and place the filter outside the CTE in order to avoid overscanning:

```sql
WITH nearest_results AS MATERIALIZED (
    SELECT id, embedding <=> '[1,2,3]' AS distance FROM items ORDER BY distance LIMIT 5
) SELECT * FROM nearest_results WHERE distance < 1 ORDER BY distance;
```

Even with iterative scans, pgvector limits the index scan in order to balance time, resource use and recall.
Increasing these limits can increase query latency but potentially improve recall:

HSNW has a configuration that controls the total number of rows that will be scanned by a query across all iterations (20,000 is the default):

```sql
SET hnsw.max_scan_tuples = 20000;
```

IVFFLat lets you configure the maximum number of lists that will be checked for nearest neighbors:

```sql
SET ivfflat.max_probes = 100;
```

## Quantization

_Introduced in pgvector 0.7.0_

Quantization is a technique of optimizing vector storage and query performance by using fewer bits to store the vectors.
By default, pgvector's Vector type is in 32-bit floating point format. The `halfvec` data type uses 16-bit floating point format, which has the following benefits:

- Reduced storage requirements (half the memory)
- Faster query performance
- Reduced index size (both in disk and memory)
- Can index vectors with up to 4096 dimensions (which covers the most popular embedding models)

To use `halfvec`, you can create a table with the `halfvec` type:

```sql
CREATE TABLE wiki_documents(
    tenant_id uuid,
    id integer,
    embedding halfvec(3) -- put the real number of dimensions here
    );
```

You can also create quantized indexes on regular sized vectors:

```sql
CREATE INDEX ON wiki_documents USING hnsw ((embedding::halfvec(3)) halfvec_l2_ops);
```

In this case, you need to cast the vector to `halfvec` in the query:

```sql
SELECT * FROM wiki_documents WHERE embedding::halfvec(3) <=> '[3,1,2]' LIMIT 10;
```

Note that to create an index on `halfvec`, you need to specify the distance function as `halfvec_l2_ops` or `halfvec_cosine_ops`.

## Sparse Vectors

_Introduced in pgvector 0.7.0_

Sparse vectors are vectors in which the values are mostly zero. These are common in text search algorithms, where each dimension represents
a word and the value represents the relative frequency of the word in the document - BM25, for example. Some embedding models, such as BGE-M3, also use sparse vectors.

Pgvector supports sparse vector type `sparsevec` and the associated similarity operators.
Because sparse vectors can be extremely large but most of the values are zero, pgvector stores them in a compressed format.

```sql
CREATE TABLE wiki_documents(
    tenant_id uuid,
    id integer,
    embedding sparsevec(5) -- put the real number of dimensions here
    );

INSERT INTO wiki_documents (tenant_id, id, embedding)
VALUES ('018ade1a-7843-7e60-9686-714bab650998', 1, '{1:1,3:2,5:3}/5');

SELECT * FROM wiki_documents ORDER BY embedding <-> '{1:3,3:1,5:2}/5' LIMIT 5;
```

The format is `{index1:value1,index2:value2,...}/N`, where N is the number of dimensions and the indices start from 1 (like SQL arrays).
Because the format is a bit unusual, it is recommended to use [pgvector's libraries for your favorite language](https://github.com/pgvector/pgvector?tab=readme-ov-file#languages)
to insert and query sparse vectors.

## Summary

You can read more about pgvector on their [github](https://github.com/pgvector/pgvector/blob/master/README.md)

If you have any feedback or questions on building AI-native SaaS applications on Nile, please do reach out on our [Github discussion forum](https://github.com/orgs/niledatabase/discussions) or our [Discord community](https://discord.gg/8UuBB84tTy).
